… a semi-Markov regenerative stopping problem with a finite number of continue actions by solving a sequence of stopping problems. New results for the optimal stopping problem are obtained, as well as for the regenerative stopping problem. Two models in the literature are used as detailed examples of the algorithm.
I. Introduction


Perhaps the most interesting result in the theory of optimal stopping from a computational standpoint is the monotone stopping theorem (Chow, Robbins, and Siegmund [6, Theorem 3.3], Ross [20, Theorem 6.14], Abdel-Hameed [1]). That theorem, which gives conditions under which a myopic policy is optimal, is reconsidered in the next section. The two main differences from earlier versions are a more general cost structure and permitting a finite number of continue actions rather than one continue action. Our motivation for reconsidering the monotone stopping theorem is to apply it in solving regenerative stopping problems.

Regenerative stopping problems are stopping problems which recommence from the initial state upon stopping. They have an infinite planning horizon, and either averaging or discounting must be used. The most important examples of regenerative stopping problems come from the literature on maintenance models, and a comprehensive description of maintenance models is available in the survey paper of Pierskalla and Voelker [16]. The impetus for this study came from Kaplan's model [14] of the optimal investigation of a production system. There the problem is to decide, based on reported monthly operating costs, when management should investigate and, if necessary, correct (stop). Once correction takes place, the problem recommences from the initial state. In [5] Buckman and Miller solve the Kaplan model as a discounted regenerative stopping problem and also obtain some general results for discounted regenerative stopping problems.

Regenerative stopping problems were first studied independently by Brender [4] and Breiman [3]. Breiman called them binary decision renewal problems.

Both authors proved that regenerative stopping problems could be solved by solving an appropriate stopping problem, which we will call a $\lambda$-stopping problem, where $\lambda$ has the interpretation of the average cost per period. The $\lambda$-stopping problem is defined by changing the cost associated with a continue action by subtracting the amount $\lambda$ from it. Let $V(\lambda)$ be the expected cost of the $\lambda$-stopping problem using an optimal policy and starting from the initial state. The theorem of Brender and Breiman is that if $\lambda^*$ satisfies $V(\lambda^*) = 0$, then the right $\lambda$ has been used, and the optimal decision rule for the $\lambda^*$-stopping problem is also the optimal decision rule for the regenerative stopping problem. For example, Taylor [21, Section 4] uses this theorem to solve a problem of optimal replacement under cumulative damage by determining $\lambda^*$ in closed form. In [11] Feldman has reconsidered a more general version of Taylor's problem and solved it by a different method. In problems where $\lambda^*$ cannot be determined in closed form, an alternative would be to solve regenerative stopping problems by solving a sequence of $\lambda$-stopping problems ending with the $\lambda^*$-stopping problem, but neither Brender nor Breiman considered this approach. This approach seems quite promising if the monotone stopping theorem can be applied to each $\lambda$-stopping problem.
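A minimal numerical sketch of the Brender-Breiman idea, in Python, under a simplifying assumption that is ours rather than the paper's: each candidate stationary policy is summarized by two hypothetical numbers, its expected cost per cycle and its expected cycle length. Subtracting $\lambda$ per period turns each policy's expected cost into a line (expected cycle cost minus $\lambda$ times expected cycle length), so $V(\lambda)$ is the lower envelope of these lines. Bisection on $V$ then finds the root $\lambda^*$, which coincides with the smallest cost-per-period ratio among the policies. A genuine $\lambda$-stopping problem would replace the enumeration by an optimal stopping computation.

```python
from typing import List, Tuple

# Hypothetical toy data: (expected cost per cycle, expected cycle length) for each
# candidate stationary policy. These numbers are illustrative only.
POLICIES: List[Tuple[float, float]] = [(12.0, 3.0), (15.0, 5.0), (24.0, 6.0)]


def V(lam: float) -> float:
    """Optimal value of the lambda-stopping problem for the toy data:
    each policy costs E[C] - lam * E[T], and V is the minimum over policies."""
    return min(c - lam * t for c, t in POLICIES)


def find_lambda_star(lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection for the root of V, assuming V(lo) >= 0 >= V(hi).
    V is decreasing because every expected cycle length is positive."""
    assert V(lo) >= 0 >= V(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_star = find_lambda_star(0.0, 10.0)
best_ratio = min(c / t for c, t in POLICIES)  # smallest average cost per period
print(lam_star, best_ratio)  # both approximately 3.0
```

The root of $V$ and the best ratio agree here because a policy's line crosses zero exactly at its own cost-per-period ratio.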

In Section 3 the theorem of Brender and Breiman is generalized by allowing a finite number of continue actions and letting the time spent in each state be a random variable, so that the problem is semi-Markov. Three further results are obtained which lead to an algorithm for solving regenerative semi-Markov stopping problems by solving a sequence of $\lambda$-stopping problems.

In the last part of the paper the replacement-stockage model of Derman and Lieberman [9] and the maintenance model with uncertain information of Rosenfield [18] are solved by this algorithm, and the approach seems quite efficient and flexible. However, the algorithm has not been tested by solving large-scale problems.
II. Optimal Stopping

Our approach is to follow the formulation of Ross [20] and describe the optimal stopping problem as a Markov decision problem with a countable number of states $0, 1, \ldots$, where state 0 is the initial state. Our formulation is discrete-time, and the process is observed at time points $t = 0, 1, 2, \ldots$.

When the system is observed in state $i$ at time $t$, we choose from a finite set $A_i$ of continue actions or decide to stop. If action $a \in A_i$ is chosen, we receive a cost of $C(i,a)$ and the process goes to the next state at time $t+1$ according to the probabilities $P_{ij}(a)$. If we stop, we receive the cost $C(i,s)$ and go to state $\Delta$, where we stay forever and $C(\Delta) = 0$ each period. The artificial state $\Delta$ is a notational convenience which allows the planning horizon to be infinite. An admissible policy $\pi$ is a decision rule which assigns to each state $i$ and period $t$ an action $\pi(i,t) \in A_i \cup \{s\}$, where $s$ is the stop decision.

Our objective is to find a decision rule which minimizes the expected cost up to and including stopping, where the initial state is 0.

In order that our objective function be well-defined and that the monotone stopping theorem apply to our problem, we need some additional restrictions on the cost structure. We will assume that there is a scalar $M$ and a set $S$ containing $\Delta$ which satisfy the three assumptions below. Often $S^c$, the complement of $S$, will be a finite set of states, and it may be empty. If $S^c$ is not empty, it will contain the initial state 0. Loosely speaking, the system starts in $S^c$ and eventually reaches $S$, the "well-behaved" set (Assumption 3(iii) below).

Assumption 1. Either (i) or (ii) holds. Condition (i) is that $|C(i,s)| \le M$ for all states $i$. Condition (ii) is that for all states $i$, $C(i,s) \ge -M$, and $P_{ij}(a) > 0$ implies that $C(j,s) \ge C(i,s)$. Furthermore, if $i \in S^c$ then $C(i,s) \le M$, and $P_{ij}(a) > 0$ implies $C(j,s) \le M$.

Assumption 2.
$$\inf_{i \in S^c} \, \min_{a \in A_i} C(i,a) > -M \quad \text{and} \quad \sup_{i \in S^c} \, \max_{a \in A_i} C(i,a) < M.$$

Assumption 3. (i) There are numbers $N$ and $\delta > 0$ such that
$$\inf_{i \in S^c} \sum_{j \in S} P^N_{ij} \ge \delta,$$
where $P^N_{ij}$ is the probability of going from state $i$ to state $j$ in $N$ periods and depends on the decision rule. We require that the above inequality hold for all decision rules.

(ii) The set $S$ is closed. By this we mean that if $i \in S$ and $j \in S^c$, then $P_{ij}(a) = 0$ for all $a \in A_i$. The stop action also satisfies (ii), since $\Delta \notin S^c$.

(iii) For some $\epsilon > 0$, $C(i,a) \ge \epsilon$ for all $i \in S \setminus \{\Delta\}$ and $a \in A_i$.

Assumptions 1 and 2 imply that the costs are bounded above and below for states in $S^c$, and Assumption 3(i) says that there is an $N$-stage contraction on the probability of staying in $S^c$. Assumption 3(ii) assumes that when the system reaches the set $S$ it will stay in $S$. Assumption 3(iii) states that there is a strictly positive cost of continuing when the system is in the set $S$.

Let $Z_t$ and $a_t$ be the state and action at time $t$. Then the expected return starting from state $i$ and using the policy $\pi$ is
$$G_\pi(i) = E_\pi\Big[\sum_{t=0}^{\infty} C(Z_t, a_t) \,\Big|\, Z_0 = i\Big].$$


This expression is well-defined because of the following lemma.

Lemma 1. Let $x^- = \max(0, -x)$. Then for all policies $\pi$,
$$E_\pi\Big[\Big(\sum_{t=0}^{\infty} C(Z_t, a_t)\Big)^- \,\Big|\, Z_0 = i\Big] \le \frac{MN}{\delta} + M.$$

Proof. The lemma follows from the fact that the continue cost per period is only negative when the system is in $S^c$, and the expected time in $S^c$ is less than or equal to $N/\delta$ for any policy. To $MN/\delta$ we add $M$, since $-M$ is a lower bound on the cost of stopping. Q.E.D.

We let $G(i) = \inf_\pi G_\pi(i)$ for all $i$, so that $G$ is the optimal return function. We will prove (Lemmas 2 and 3) that $G$ satisfies the equation of optimality
$$G(i) = \min\Big\{ C(i,s),\ \min_{a \in A_i}\Big( C(i,a) + \sum_j P_{ij}(a)\, G(j) \Big) \Big\}. \tag{1}$$
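For a small finite-state instance, equation (1) suggests a direct numerical procedure: start from the stopping costs, apply the right-hand side of (1) repeatedly until it no longer changes, and read off the minimizing action in each state. The Python sketch below is plain successive approximation under that finite-state assumption; the dictionary encoding of actions, costs, and transition probabilities is hypothetical, and the absorbing state $\Delta$ is left implicit (stopping simply ends the cost stream). Started from the stopping costs, the $n$th iterate is the optimal cost when stopping is forced within $n$ periods.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: actions[i] maps each continue action a in A_i to
# (C(i, a), {j: P_ij(a)}); stop_cost[i] is C(i, s).
Actions = Dict[str, Tuple[float, Dict[int, float]]]


def solve_optimality_equation(stop_cost: List[float], actions: List[Actions],
                              tol: float = 1e-10, max_iter: int = 100_000):
    """Successive approximation for equation (1), started from G = C(., s).
    Returns the (approximate) fixed point G and a stationary policy ("s" = stop)."""
    G = list(stop_cost)
    for _ in range(max_iter):
        new_G, policy = [], []
        for i, s_cost in enumerate(stop_cost):
            best_val, best_act = s_cost, "s"           # stopping wins ties
            for a, (c, probs) in actions[i].items():
                val = c + sum(p * G[j] for j, p in probs.items())
                if val < best_val:
                    best_val, best_act = val, a
            new_G.append(best_val)
            policy.append(best_act)
        if max(abs(x - y) for x, y in zip(new_G, G)) < tol:
            return new_G, policy
        G = new_G
    raise RuntimeError("successive approximation did not converge")


# Hypothetical three-state example with two continue actions in state 1.
stop = [4.0, 2.0, 6.0]
acts = [
    {"a": (1.0, {1: 1.0})},
    {"a": (1.0, {2: 1.0}), "b": (0.5, {1: 0.5, 2: 0.5})},
    {"a": (1.0, {2: 1.0})},
]
G, policy = solve_optimality_equation(stop, acts)
print(G, policy)  # [3.0, 2.0, 6.0] ['a', 's', 's']
```

Because every action in state 1 is compared, the sketch also shows the extra work caused by several continue actions: the minimizing one must be identified wherever stopping is not optimal.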

Our formulation differs from Ross [20] in two major ways. Unlike Ross, we permit a finite number of continue actions. The generalization to a finite number of continue actions does not complicate the derivation of the monotone stopping theorem, which gives a condition for stopping to be optimal. It does represent a major complication for obtaining the optimal policy, since we must determine which continue action to use in states where stopping is not optimal. This complication is addressed in the last section, where specific models are solved. It seems possible to generalize the action space further using, for example, the methods of Fox [12].

The other difference between our formulation and that of Ross is that he imposes more restrictions on the cost structure. He requires for all states $i$ that (a) $0 \ge C(i,s) \ge -M$ and (b) for some $c > 0$, $C(i,a) \ge c$. Although on [20, p. 135] Ross does not require that $C(i,s) \le 0$, his proof of the monotone stopping theorem requires nonpositive stopping costs.

The general statement of the optimal stopping problem is given in Chapter 3 of Chow, Robbins, and Siegmund [6]. They assume that a sequence of random variables $Y_1, Y_2, \ldots$ having a known joint distribution is observed. If we stop at the $n$th stage after having observed $y_1, \ldots, y_n$, then a cost of $x_n = f_n(y_1, \ldots, y_n)$ is incurred. The objective is to stop so as to minimize the expected value of the cost received upon stopping. If for some sample path we never stop, then the cost is undefined, so this formulation requires that an admissible decision rule stop with probability one.

Besides the limitation of a countable state space, our Markov decision formulation is more restrictive than that of Chow, Robbins, and Siegmund [6]. For example, Derman and Sacks [10] consider an equipment replacement problem which fits our Markov decision formulation except that their criterion is to minimize the expected cost up to and including stopping divided by the number of periods until stopping. This cost structure can be handled by the Chow, Robbins, and Siegmund formulation but not by ours. In that paper they also mention the more plausible criterion of the expected cost up to and including stopping divided by the expected number of periods before stopping, which is a regenerative stopping problem.

Returning to our model, we want to establish the monotone stopping theorem under Assumptions 1-3. Our approach follows that of Ross [20]. Although we admit policies where the expected time until stopping is infinite, we begin by observing that we can eliminate those policies from further consideration, since their expected cost is infinite. This follows from Lemma 1, Assumption 3(iii), and the fact that such a policy spends an infinite expected time in $S \setminus \{\Delta\}$ (the expected time spent in $S^c$ is at most $N/\delta$). In the lemmas to follow we will implicitly use the fact that $G(\Delta) = 0$.

Lemma 2. For $j \in S \setminus \{\Delta\}$:

(a) The optimal return function satisfies the equation of optimality (1).

(b) The stationary policy which for each state $j \in S$ selects the action which minimizes the right-hand side of (1) is optimal.

(c) The optimal return function $G$ satisfies $C(j,s) \ge G(j) \ge -M$.

Proof. The set $S$ is closed, and the states in $S$ satisfy Ross' restrictions on the cost structure. Furthermore, as noted above, we only need consider decision rules such that the expected time until stopping is finite. Therefore Ross' arguments [20, pp. 135-136] apply directly to prove (a) and (b) of the lemma. The upper bound in (c) is obvious, and the lower bound holds since the cost of stopping is bounded below by $-M$ and the costs of continuing are nonnegative. Q.E.D.

Lemma 3. For $j \in S^c$:

(a) and (b) of Lemma 2 hold.

(c) The optimal return function $G$ is bounded above and below.

Proof. Let $u$ and $v$ be bounded functions on $S^c$, let $\pi$ be a policy defined on $S^c$, and let $\rho(u,v) = \sup_{i \in S^c} |u(i) - v(i)|$. Let $T^N_\pi(u)$ be the expected return in the first $N$ periods using the policy $\pi$, where $u$ is the terminal reward vector, under the assumption that if a state $j \in S$ is reached in some period $n$, $n < N$, then the process terminates with a reward $G(j)$. Then $\rho(T^N_\pi(u), T^N_\pi(v)) \le (1-\delta)\,\rho(u,v)$, since the probability of leaving $S^c$ in $N$ periods is at least $\delta$ by Assumption 3(i). Also $T^N_\pi(u)$ is bounded if $u$ is bounded, since both the costs of continuing and stopping are bounded on $S^c$, and the possibility of reaching $S$ does not destroy the boundedness property. The latter is established by combining Lemma 2(c) with the implication of Assumption 1(ii) that the transition out of $S^c$ will be to a state with a bounded stopping cost.

Thus we have the $N$-stage contraction property, and (a), (b), and (c) of the lemma follow from Denardo [7]. Q.E.D.
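The contraction step can be checked numerically for a fixed policy. Under that policy, two terminal-reward vectors $u$ and $v$ on $S^c$ give $N$-period returns that differ only through the paths still inside $S^c$ after $N$ periods, so the difference is $Q^N(u - v)$, where $Q$ is the transition matrix restricted to $S^c$ (substochastic, because probability leaks into $S$). The largest row sum of $Q^N$ is then the contraction factor $1 - \delta$. The two-state matrix below is hypothetical, and the sketch deliberately uses plain Python lists rather than a linear-algebra library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def n_step(Q: Matrix, N: int) -> Matrix:
    """Q^N for N >= 1: N-step transition probabilities that stay inside S^c."""
    R = Q
    for _ in range(N - 1):
        R = matmul(R, Q)
    return R


def contraction_factor(Q: Matrix, N: int) -> float:
    """Largest probability, over starting states in S^c, of still being in S^c
    after N periods; Assumption 3(i) says this is at most 1 - delta."""
    return max(sum(row) for row in n_step(Q, N))


def sup_dist(u: List[float], v: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical S^c = {0, 1} under a fixed policy: state 0 moves to state 1 w.p. 0.9
# (leaves S^c w.p. 0.1); state 1 moves to state 0 w.p. 0.5 (leaves S^c w.p. 0.5).
Q = [[0.0, 0.9],
     [0.5, 0.0]]
N = 2
factor = contraction_factor(Q, N)        # 0.45 here, i.e. delta = 0.55 for N = 2
u, v = [10.0, -3.0], [4.0, 1.0]
QN = n_step(Q, N)
# Only the Q^N u part of the N-period return depends on the terminal vector.
diff_after = sup_dist(matvec(QN, u), matvec(QN, v))
assert diff_after <= factor * sup_dist(u, v) + 1e-12
```

With $N = 1$ the factor is 0.9, a weak contraction; taking $N = 2$ improves it to 0.45, which is why the assumption is stated for $N$ periods rather than one.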

Lemma 4. The nonnegative term $C(i,s) - G(i)$ is bounded.

Proof. If Assumption 1(i) holds, the result is immediate, since $C(i,s)$ is bounded above, and by Lemmas 2(c) and 3(c) $G(i)$ is bounded below.

If Assumption 1(ii) holds, then for $i \in S$, $C(i,s) = G(i)$, since the continue costs are nonnegative and $C(j,s) \ge C(i,s)$ if $P_{ij}(a) > 0$. For $i \in S^c$, $C(i,s)$ is bounded above by assumption, and by Lemma 3(c) $G(i)$ is bounded below. Q.E.D.

Let $G^n$ be the optimal return function for an $n$-period problem where we are required to have stopped after $n$ periods. Clearly $G^n \ge G^{n+1} \ge G$.

Lemma 5. For each state $i$, $\lim_{n \to \infty} G^n(i) = G(i)$.

Proof. By Lemmas 2(b) and 3(b) there is an optimal policy which can be obtained from (1). Following Ross [20, Theorem 6.13], we let $\pi$ be this policy and $\pi_n$ be the policy which uses the same action as $\pi$ for periods $0, 1, \ldots, n-1$, but stops in period $n$. Then $G^n(i) \le G_{\pi_n}(i)$ and
$$G_{\pi_n}(i) - G(i) = \sum_{j \ne \Delta} \big[C(j,s) - G(j)\big]\, P(Z_n = j),$$
where $Z_n$ is the state in period $n$. By Lemma 4 the term in brackets is bounded, and since the expected time until stopping starting from any state $i$ is bounded, $P(Z_n = \Delta) \to 1$ as $n \to \infty$. This shows that $\lim_{n \to \infty} G_{\pi_n}(i) = G(i)$, and hence, since $G(i) \le G^n(i) \le G_{\pi_n}(i)$, that $\lim_{n \to \infty} G^n(i) = G(i)$. Q.E.D.
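Lemma 5 can be watched on a small example. In the hypothetical two-state problem below, stopping in state 0 costs 10, while continuing costs 1 and leads back to state 0 or on to state 1 with probability 1/2 each; stopping in state 1 is free. The optimal return is $G(0) = 2$, the solution of $G(0) = 1 + G(0)/2$, and the backward recursion in this Python sketch reproduces the closed form $G^n(0) = 2 + 8 \cdot 2^{-n}$, which decreases to it.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: actions[i] maps a continue action to (C(i,a), {j: P_ij(a)}).
Actions = Dict[str, Tuple[float, Dict[int, float]]]


def n_period_returns(stop_cost: List[float], actions: List[Actions],
                     n_max: int) -> List[List[float]]:
    """Backward recursion for G^n, the optimal cost when stopping is forced by period n:
    G^0(i) = C(i,s) and G^n(i) = min(C(i,s), min_a C(i,a) + sum_j P_ij(a) G^{n-1}(j))."""
    history = [list(stop_cost)]
    for _ in range(n_max):
        prev = history[-1]
        history.append([
            min([stop_cost[i]] + [c + sum(p * prev[j] for j, p in probs.items())
                                  for c, probs in actions[i].values()])
            for i in range(len(stop_cost))
        ])
    return history


stop = [10.0, 0.0]
acts = [{"run": (1.0, {0: 0.5, 1: 0.5})}, {"run": (1.0, {1: 1.0})}]
hist = n_period_returns(stop, acts, 10)
for n, g in enumerate(hist):
    assert g[0] == 2.0 + 8.0 * 2.0 ** -n        # closed form for this example
    if n > 0:
        assert hist[n][0] <= hist[n - 1][0]      # G^n is nonincreasing in n
```

In this example $P(Z_n = \Delta)$ under the optimal policy is $1 - 2^{-n}$, matching the $2^{-n}$ rate at which $G^n(0)$ approaches $G(0)$.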

Lemma 5 has been called a stability condition by Ross [20] and Breiman [3]. Lemmas 2, 3, and 5 are sufficient to establish the monotone stopping theorem.


Let
$$B = \Big\{ i : C(i,s) \le \min_{a \in A_i}\Big( C(i,a) + \sum_j P_{ij}(a)\, C(j,s) \Big) \Big\}.$$
$B$ is precisely the set of states for which stopping is at least as good as continuing one more period and then stopping. Clearly if $i \notin B$ we continue, but not necessarily with the action which minimizes $C(i,a) + \sum_j P_{ij}(a)\, C(j,s)$. Rather than stating the monotone stopping theorem for the set $B$, we state it for a subset $D$ of $B$. This can be useful, since a subset of $B$ may be easier to identify than $B$.
The Monotone Condition. A set of states $D$ satisfies the monotone condition if it is closed and is a subset of $B$.

Theorem 1 (The Monotone Stopping Theorem). If a set $D$ satisfies the monotone condition, then for $i \in D$ the optimal decision is to stop.

Proof. The proof of Ross [20, Theorem 6.14] applies. Using the definition of $B$ and the fact that $D \subseteq B$ is closed, Ross' proof shows that $G^n(i) = C(i,s)$ for all $n$ and $i \in D$. Then by Lemma 5, $G(i) = \lim_{n \to \infty} G^n(i) = C(i,s)$. Therefore the stop decision minimizes the right-hand side of eq. (1), and by Lemmas 2(b) and 3(b) the stop decision is optimal. Q.E.D.
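For a finite-state instance, both ingredients of the monotone condition are mechanical to check: the one-stage look-ahead set $B$ compares each stopping cost with the cost of continuing once and then stopping, and closedness asks that no continue action lead out of the candidate set $D$. The Python sketch below does both; the data (one action "run" with unit cost, states moving upward, stopping costs that fall with the state index) are hypothetical and chosen so that $B = \{1, 2, 3, 4\}$.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: actions[i] maps a continue action to (C(i,a), {j: P_ij(a)}).
Actions = Dict[str, Tuple[float, Dict[int, float]]]


def one_stage_set(stop_cost: List[float], actions: List[Actions]) -> Set[int]:
    """B = {i : C(i,s) <= min_a [C(i,a) + sum_j P_ij(a) C(j,s)]}."""
    B = set()
    for i, acts in enumerate(actions):
        look_ahead = min((c + sum(p * stop_cost[j] for j, p in probs.items())
                          for c, probs in acts.values()), default=float("inf"))
        if stop_cost[i] <= look_ahead:
            B.add(i)
    return B


def is_closed(D: Set[int], actions: List[Actions]) -> bool:
    """D is closed if no continue action from a state in D reaches a state outside D."""
    return all(j in D
               for i in D
               for _, probs in actions[i].values()
               for j, p in probs.items() if p > 0)


def satisfies_monotone_condition(D: Set[int], stop_cost: List[float],
                                 actions: List[Actions]) -> bool:
    return D <= one_stage_set(stop_cost, actions) and is_closed(D, actions)


# Hypothetical data: "run" costs 1 per period and moves state i to min(i + 1, 4).
stop = [6.0, 4.0, 3.0, 3.0, 3.0]
acts = [{"run": (1.0, {min(i + 1, 4): 1.0})} for i in range(5)]
print(sorted(one_stage_set(stop, acts)))                        # [1, 2, 3, 4]
print(satisfies_monotone_condition({1, 2, 3, 4}, stop, acts))   # True: stop in 1-4
print(satisfies_monotone_condition({1, 2}, stop, acts))         # False: 2 moves to 3
```

The last line shows why closedness matters: $\{1, 2\}$ lies inside $B$, but the process can leave it, so the one-stage comparison alone does not justify stopping there.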

III. Regenerative Stopping Problems

Regenerative stopping problems are stopping problems which return to state 0 upon stopping and recommence. We will continue with the countable state space of the previous section, except that the artificial state $\Delta$ is eliminated. However, the transition times will be generalized so that we have a semi-Markov formulation. There is no standard semi-Markov decision model (compare, for example, Denardo [8], Lippman [15], and Ross [19]), and we will use one of the simpler versions.

The process begins at state 0 at time 0. When the system is observed in state $i$ (immediately) after the $n$th transition, we choose from a finite set $A_i$ of continue actions or decide to stop. If the action $a \in A_i$ is chosen, the process goes to the next state according to the probabilities $P_{ij}(a)$. Given that the next state is $j$, the time for the transition to take place is a random variable $T_{ija}$. We will require that for some $\delta > 0$ and $\epsilon > 0$,
$$\sum_j P_{ij}(a)\, F_{ija}(\delta) \le 1 - \epsilon \quad \text{for all } i \text{ and } a,$$
where $F_{ija}$ is the distribution function of $T_{ija}$.

This condition says that for every state $i$ and continue action $a$ there is a probability of at least $\epsilon$ that the transition time will be greater than $\delta$, which, in turn, means that there is a strictly positive lower bound on the expected time until the system returns to state 0. In any state $i \ge 0$ we may choose to stop. Then $P_{i0}(s) = 1$, and the time until reaching state zero is given by the random variable $T_{is}$. There are no restrictions on $T_{is}$ except when $i = 0$. For state 0 we require that for some $\delta > 0$ and $\epsilon > 0$, $\Pr(T_{0s} > \delta) \ge \epsilon$. We require that both $E[T_{ija}]$ and $E[T_{is}]$ be finite. The costs for continuing and stopping are given by the nonnegative random functions $C(i,a,T_{ija})$ and $C(i,s,T_{is})$, respectively. They are both incurred at the beginning of a transition. Following Lippman [15], we let a policy $\pi$ be a decision rule which, given the number of the transition and the state, says which action is to be chosen. It would be preferable to have the decision rule depend on the time of the transition rather than the number of the transition, since in a finite-horizon semi-Markov problem the optimal policy would depend on the time remaining to the end of the horizon but not on the number of the transition (Jewell [13]). However, the additional complication of allowing the decision rule to depend on time does not seem justified for our purposes. For a policy $\pi$ we let
$$\lambda_\pi = \limsup_{t \to \infty} \frac{1}{t}\, E[W_\pi(t)],$$
where $W_\pi(t)$ is the cost incurred up to and including time $t$ using the policy $\pi$ starting from state 0, so that $\lambda_\pi$ represents the average cost per period using the policy $\pi$. The conditions on the $T_{is}$ and $T_{ija}$ and the nonnegativity of the costs assure that $\lambda_\pi$ is well-defined, although possibly infinite. Let $\lambda^* = \inf_\pi \lambda_\pi$. Our objective is to find a policy $\pi^*$ such that $\lambda_{\pi^*} = \lambda^*$.
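Because the process regenerates at every return to state 0, the long-run average cost of a fixed policy equals the expected cost per cycle divided by the expected cycle length (the renewal-reward theorem). The Python simulation below illustrates this for a hypothetical policy that is not taken from the paper: the system runs at cost 1 per period until a failure, which occurs with probability 0.2 in each period, and is then restored at a cost of 10 over 2 periods. A cycle therefore has expected cost $5 + 10 = 15$ and expected length $5 + 2 = 7$, and the simulated ratio of cost to elapsed time should settle near $15/7$.

```python
import random


def simulate_average_cost(p_fail: float, run_cost: float, repair_cost: float,
                          repair_time: float, horizon: float, seed: int = 0) -> float:
    """Estimate the long-run cost per period by simulating whole regeneration
    cycles until the elapsed time reaches `horizon`."""
    rng = random.Random(seed)
    elapsed, total_cost = 0.0, 0.0
    while elapsed < horizon:
        periods = 1                          # operating periods until failure (geometric)
        while rng.random() >= p_fail:
            periods += 1
        total_cost += run_cost * periods + repair_cost
        elapsed += periods + repair_time
    return total_cost / elapsed


estimate = simulate_average_cost(p_fail=0.2, run_cost=1.0, repair_cost=10.0,
                                 repair_time=2.0, horizon=1_000_000.0)
exact = (1.0 / 0.2 * 1.0 + 10.0) / (1.0 / 0.2 + 2.0)   # E[cycle cost] / E[cycle length]
print(estimate, exact)
```

Measuring the ratio only at cycle boundaries is a simplification; over a long horizon the partial final cycle has a negligible effect.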

We propose to solve our semi-Markov regenerative stopping problem by solving a sequence of $\lambda$-stopping problems, where $\lambda$, $-\infty < \lambda < \infty$, has the interpretation of the average cost per period. A $\lambda$-stopping problem is constructed from the data of a semi-Markov regenerative stopping problem by letting the $P_{ij}(a)$ and the times between transitions remain unchanged, and setting
$$C_\lambda(i,a,T_{ija}) = C(i,a,T_{ija}) - \lambda T_{ija}, \qquad C_\lambda(i,s,T_{is}) = C(i,s,T_{is}) - \lambda T_{is}.$$
Thus we have an optimal stopping problem which may or may not satisfy Assumptions 1-3, and for which the length of time between transitions is a random variable. Clearly the optimal stopping problem is unaffected by the length of time between transitions as long as these lengths are finite with probability one.
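When only expectations are needed, as in a finite-state numerical treatment, the construction reduces to adjusting expected one-transition costs: the $\lambda$-stopping cost of an action is its expected cost minus $\lambda$ times its expected transition time. The Python helper below sketches this under the assumption that $E[C(i,a,T)]$, the $P_{ij}(a)$, and the expected times $E[T_{ija}]$ and $E[T_{is}]$ are supplied directly; the data layout and all names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout:
#   cont[i][a] = (E[C(i,a,T)], {j: (P_ij(a), E[T_ija])})
#   stop[i]    = (E[C(i,s,T_is)], E[T_is])
Cont = Dict[int, Dict[str, Tuple[float, Dict[int, Tuple[float, float]]]]]
Stop = Dict[int, Tuple[float, float]]


def lambda_stopping_costs(cont: Cont, stop: Stop, lam: float):
    """Expected one-transition costs of the lambda-stopping problem.
    Transition probabilities are unchanged; each expected cost is reduced by
    lam times the expected duration of the transition it pays for."""
    cont_lam = {
        i: {a: c - lam * sum(p * t for p, t in nxt.values())
            for a, (c, nxt) in acts.items()}
        for i, acts in cont.items()
    }
    stop_lam = {i: c - lam * t for i, (c, t) in stop.items()}
    return cont_lam, stop_lam


cont = {0: {"run": (2.0, {0: (0.5, 3.0), 1: (0.5, 1.0)})},
        1: {"run": (4.0, {1: (1.0, 1.0)})}}
stop = {0: (0.0, 1.0), 1: (6.0, 2.0)}
cont_lam, stop_lam = lambda_stopping_costs(cont, stop, lam=0.5)
print(cont_lam, stop_lam)  # {0: {'run': 1.0}, 1: {'run': 3.5}} {0: -0.5, 1: 5.0}
```

Replacing the random costs by their expectations is legitimate here only because the stopping objective is an expected total cost; the adjusted costs can be negative, which is why Assumptions 1-3 must be rechecked for each $\lambda$.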


Let $G(i,\lambda)$ be the optimal return function of the $\lambda$-stopping problem from the initial state $i$. The case $i = 0$ is important enough that we introduce the function $V$ defined by $V(\lambda) = G(0,\lambda)$. Let $\Lambda$ be the set of $\lambda$ such that the $\lambda$-stopping problem satisfies Assumptions 1-3 of the previous section. The set $\Lambda$ is a semi-infinite interval: if $\lambda' \in \Lambda$ and $\lambda \le \lambda'$, then $\lambda \in \Lambda$, since only Assumption 3(iii) depends on the choice of $\lambda$, and it is easier to satisfy the smaller the value of $\lambda$. Unless otherwise stated, we will assume that $\lambda \in \Lambda$ for any $\lambda$-stopping problem being considered. Besides illustrating the notation, the following is an example of a problem where there is no $\lambda$ such that $V(\lambda) = 0$.


Example 1. Consider a discrete-time problem where there is only one continue action for each state, and $P$ … (… 0 with probability one) and $C(i,s,0) = 1$. For state 0 the stop action takes one period at a cost of 10 …

By inspection, the optimal policy for the regenerative stopping problem is to never stop, and the average cost per period is 1. If $\lambda < 1$, then the set $S$ can be the entire space $0, 1, 2, \ldots$, and Assumptions 1-3 hold for the $\lambda$-stopping problem. If $\lambda \ge 1$, the set $S$ must be empty by Assumption 3(iii), but then Assumption 3(i) cannot be satisfied. Therefore $\Lambda = (-\infty, 1)$.

The function $V(\lambda)$ for this problem is

$V(\lambda) = 2 - \lambda$ for $\lambda < 1$ (the optimal policy is to stop after the transition from state 0 to state 1);

$V(\lambda) = -\infty$ for $\lambda > 1$ (the optimal policy is to never stop).

Clearly there is no $\lambda^*$ such that $V(\lambda^*) = 0$.

Let $\pi$ be a policy for the $\lambda$-stopping problem which satisfies $E[T_\pi] < \infty$, where $T_\pi$ is the random time until stopping using the policy $\pi$, including the duration of the stop transition (that is, the time until the return to state 0). Let $V_\pi(\lambda)$ be the expected cost of the $\lambda$-stopping problem starting from state 0 using the policy $\pi$. Then
$$V_\pi(\lambda) = E[C_\pi] - \lambda E[T_\pi], \tag{2}$$
where $C_\pi$ is the original cost (without subtracting the $\lambda$ terms) up to and including stopping using the policy $\pi$. Thus $E[C_\pi] = V_\pi(0) = G_\pi(0,0)$. Equation (2) simply decomposes the costs of a $\lambda$-stopping problem.

We will now prove the theorem of Brender [4] and Breiman [3] in the semi-Markov case, where policies are admitted for which the expected time until stopping is infinite.

Theorem 2. Suppose that $\lambda^* \in \Lambda$ satisfies $V(\lambda^*) = 0$. Then the policy which is optimal for the $\lambda^*$-stopping problem is optimal for the regenerative stopping problem. Also, $\lambda^*$ is the optimal expected cost per period, $\inf_\pi \lambda_\pi$.

Proof. Let $\pi$ be a stationary policy which solves the $\lambda^*$-stopping problem. Such a policy exists by Lemmas 2 and 3, and furthermore $E[T_\pi] < \infty$, where $T_\pi$ is the random time until stopping using the policy $\pi$. The expected cost up to and including returning to state 0 using the policy $\pi$ is $\lambda^* E[T_\pi]$, from (2), since $V_\pi(\lambda^*) = V(\lambda^*) = 0$.
Let $Y_i$ be the random cost in the regenerative stopping problem for the $i$th return to state 0 using the policy $\pi$. We have just shown that $E[Y_i] = \lambda^* E[T_\pi]$. By Ross [20, Theorem 3.16], including his subsequent remarks, $\lim_{t \to \infty} \frac{1}{t} E[W_\pi(t)] = \lambda^*$. Since $\lambda_\pi = \lim_{t \to \infty} \frac{1}{t} E[W_\pi(t)]$, we will have established that $\lambda^*$ is the optimal average cost once we have established that $\pi$ is the optimal policy for the regenerative stopping problem.

Now let $\pi'$ be an arbitrary admissible policy for the regenerative stopping problem with $E[T_{\pi'}] < \infty$. The expected cost up to and including returning to state 0 using the policy $\pi'$ for the regenerative stopping problem is greater than or equal to $\lambda^* E[T_{\pi'}]$, using $V_{\pi'}(\lambda^*) \ge V(\lambda^*) = 0$ and equation (2). The same analysis as above shows that $\lim_{t \to \infty} \frac{1}{t} E[W_{\pi'}(t)] \ge \lambda^*$.

t  ♦  • 

Let $\pi''$ be an arbitrary policy for the regenerative stopping problem with $E[T_{\pi''}] = \infty$. We consider a modified regenerative stopping problem where
$$C(i,a,T_{ija})' = C(i,a,T_{ija}) - \lambda^* T_{ija}$$
and
$$C(i,s,T_{is})' = C(i,s,T_{is}) - \lambda^* T_{is}.$$
The proof consists of showing that the average reward of the modified regenerative stopping problem is greater than or equal to zero, and therefore that the average reward for the original regenerative stopping problem is greater than or equal to $\lambda^*$.

The expected cost up to and including returning to state 0 for the modified regenerative problem is $V_{\pi''}(\lambda^*)$, since the modified regenerative problem has the same costs as the $\lambda^*$-stopping problem. By using Lemma 1 ($\lambda^* \in \Lambda$) and the fact that $E[T_{\pi''}] = \infty$, Ross [20, Theorem 3.16] can be easily modified to show that the average reward of the modified regenerative stopping problem is greater than or equal to zero. Q.E.D.

The following propositions will be used in the algorithm for solving regenerative stopping problems. The first establishes some useful properties of the function $V$. The second gives an alternative optimality condition, and the third shows how the solutions improve as successive $\lambda$-stopping problems are solved.

Proposition 1. $V : \Lambda \to \mathbb{R}$ is a decreasing, finite-valued, and concave function of $\lambda$. Since $V$ is concave, it is known that the right- and left-hand derivatives exist everywhere on the interior of $\Lambda$. Furthermore
$$V'_-(\lambda) \ge -E[T_\lambda] \ge V'_+(\lambda), \tag{3}$$
where $V'_-$ and $V'_+$ are the left- and right-hand derivatives of $V$, and $T_\lambda$ is the stopping time of a $\lambda$-optimal policy.

Proof. It is clear that $V$ is decreasing in $\lambda$. Furthermore, $V > -\infty$ for $\lambda \in \Lambda$, since Lemma 1 applies. To show that $V$ is concave, consider points $\lambda_1$, $\alpha\lambda_1 + (1-\alpha)\lambda_2$, and $\lambda_2$, where $0 < \alpha < 1$. Let $\pi$ be an optimal stationary policy for the $(\alpha\lambda_1 + (1-\alpha)\lambda_2)$-stopping problem. If that same policy $\pi$ is used for the $\lambda_1$- and $\lambda_2$-stopping problems, then from (2)
$$V(\alpha\lambda_1 + (1-\alpha)\lambda_2) = V_\pi(\alpha\lambda_1 + (1-\alpha)\lambda_2) = \alpha V_\pi(\lambda_1) + (1-\alpha)\, V_\pi(\lambda_2).$$
However, $V_\pi(\lambda_1) \ge V(\lambda_1)$ and $V_\pi(\lambda_2) \ge V(\lambda_2)$, which shows the concavity of $V$.

The inequalities on the right- and left-hand derivatives are established by a similar approach. Let $\pi$ be the optimal policy for the $\lambda$-stopping problem. Then for $\epsilon > 0$,
$$V(\lambda+\epsilon) - V(\lambda) \le V_\pi(\lambda+\epsilon) - V_\pi(\lambda) = -\epsilon\, E[T_\pi],$$
where $T_\pi$ is the stopping time of the policy $\pi$. Dividing by $\epsilon$ and letting $\epsilon$ go to zero establishes the result for $V'_+(\lambda)$. The proof for $V'_-(\lambda)$ is similar. Q.E.D.
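Proposition 1 is easy to visualize when $V$ is the minimum of finitely many lines $V_\pi(\lambda) = E[C_\pi] - \lambda E[T_\pi]$, one per stationary policy, as in (2): a minimum of decreasing affine functions is decreasing and concave, and at a kink the left and right slopes are the negated expected stopping times of the policies that are optimal there. The Python check below uses three hypothetical policies whose lines cross at $\lambda = 2$ and $\lambda = 3$, and it tests monotonicity, midpoint concavity between neighbouring grid points, and inequality (3) with one-sided finite differences.

```python
from typing import List, Tuple

# Hypothetical (E[C], E[T]) pairs; with these, V has kinks at lambda = 2 and lambda = 3.
POLICIES: List[Tuple[float, float]] = [(8.0, 1.0), (14.0, 4.0), (26.0, 8.0)]


def V(lam: float) -> float:
    """Lower envelope of the lines V_pi(lam) = E[C] - lam * E[T], as in (2)."""
    return min(c - lam * t for c, t in POLICIES)


def expected_time_of_optimal_policy(lam: float) -> float:
    """E[T_lam] for an optimal policy (ties broken by list order)."""
    return min(POLICIES, key=lambda ct: ct[0] - lam * ct[1])[1]


def check_proposition_1(grid: List[float], h: float = 1e-6, tol: float = 1e-6) -> bool:
    for a, b in zip(grid, grid[1:]):
        assert V(b) < V(a)                                        # decreasing
        assert V(0.5 * (a + b)) >= 0.5 * (V(a) + V(b)) - 1e-12    # midpoint concavity
    for lam in grid:
        left = (V(lam) - V(lam - h)) / h                          # approximates V'_-(lam)
        right = (V(lam + h) - V(lam)) / h                         # approximates V'_+(lam)
        slope = -expected_time_of_optimal_policy(lam)
        assert left + tol >= slope >= right - tol                 # inequality (3)
    return True


print(check_proposition_1([0.5 * i for i in range(11)]))  # True
```

At $\lambda = 2$ the left slope is $-1$ and the right slope is $-4$; either optimal policy's $-E[T]$ lies between them, which is exactly what (3) asserts.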

Proposition 2. If $\pi$ is optimal for both a $\lambda_1$-stopping problem and a $\lambda_2$-stopping problem, where $V(\lambda_1) \ge 0 \ge V(\lambda_2)$, $\lambda_1 < \lambda_2$, and $\lambda_1, \lambda_2 \in \Lambda$, then $\pi$ is optimal for the regenerative stopping problem.

Proof. From (2), $V_\pi(\lambda) = V_\pi(\lambda_2) - (\lambda - \lambda_2)\, E[T_\pi]$. By Proposition 1, $V(\cdot)$ is concave, $V'_+(\lambda_1) \le -E[T_\pi]$, and $V'_-(\lambda_2) \ge -E[T_\pi]$. Since the one-sided derivatives of a concave function are nonincreasing, it follows that $V'(\lambda) = -E[T_\pi]$ for $\lambda_1 < \lambda < \lambda_2$. This, together with either $V_\pi(\lambda_1) = V(\lambda_1)$ or $V_\pi(\lambda_2) = V(\lambda_2)$, implies that $V_\pi(\lambda) = V(\lambda)$ for $\lambda_1 \le \lambda \le \lambda_2$. Furthermore, since $V(\lambda_1) \ge 0$ and $V(\lambda_2) \le 0$, it is clear that $V(\lambda) = 0$ for some $\lambda$ between $\lambda_1$ and $\lambda_2$, and that $\pi$ is optimal for the regenerative stopping problem by Theorem 2. Q.E.D.

Proposition 3. If $V(\lambda_2) > V(\lambda_1) > 0 = V(\lambda^*)$, then the $\lambda_1$-optimal policy is at least as good as the $\lambda_2$-optimal policy with respect to the regenerative stopping problem. Likewise, if $V(\lambda_2) < V(\lambda_1) < 0 = V(\lambda^*)$, the $\lambda_1$-optimal policy is at least as good as the $\lambda_2$-optimal policy with respect to the regenerative stopping problem.

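Proof. We only prove the proposition for the case $V(\lambda_2) > V(\lambda_1) > 0$, since the other proof is similar. Let $\pi_1$ be the $\lambda_1$-optimal policy and $\pi_2$ be the $\lambda_2$-optimal policy.

First we establish the inequality $E[T_2] \le E[T_1]$, where the subscript 1 refers to the policy $\pi_1$ and 2 refers to $\pi_2$. Assume the contrary. Then $(\lambda_2 - \lambda_1)\, E[T_2] < (\lambda_2 - \lambda_1)\, E[T_1]$, since $\lambda_2 < \lambda_1$. Also $E[C_2] - \lambda_2 E[T_2] \le E[C_1] - \lambda_2 E[T_1]$. Adding these two inequalities implies that $\pi_2$ is strictly better than $\pi_1$ for the $\lambda_1$-stopping problem, a contradiction.

Returning to the main argument,
$$E[C_2] - \lambda_1 E[T_2] \ge E[C_1] - \lambda_1 E[T_1] = V(\lambda_1) > 0.$$
If we divide both sides by $E[T_2]$ and $E[T_1]$ respectively, the inequality is maintained (both sides are positive and $E[T_2] \le E[T_1]$), and
$$\frac{E[C_2]}{E[T_2]} - \lambda_1 \ge \frac{E[C_1]}{E[T_1]} - \lambda_1,$$
so the average cost of $\pi_1$ is no larger than that of $\pi_2$. Q.E.D.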

From Example 1 we see that for some problems there is no $\lambda$ such that $V(\lambda) = 0$. In that example the optimal policy was to never stop. Proposition 1 implies that the existence of a $\bar\lambda \in \Lambda$ such that $V(\bar\lambda) < 0$ is a sufficient condition for there to be a $\lambda^* \in \Lambda$ such that $V(\lambda^*) = 0$. The existence of such a $\bar\lambda$ is often easy to verify. If we have such a $\bar\lambda$, then the algorithm below will only consider $\lambda \le \bar\lambda$, which therefore belong to $\Lambda$.

The Regenerative Stopping Algorithm.

Step 0.A. Find a $\lambda_0$ which is less than $\lambda^*$, the optimal average cost of the regenerative stopping problem. The models of the next section will provide examples of how this $\lambda_0$ can be found. It is desirable that $\lambda_0$ be as large as possible. We solve the $\lambda_0$-stopping problem and let $\pi$ be the optimal policy for that problem. Since $\lambda_0 \le \lambda^*$, $V(\lambda_0) \ge 0$. If a mistake is made and $\lambda_0$ is greater than $\lambda^*$, then $V(\lambda_0) < 0$, and $\lambda_0$ can be changed until $V(\lambda_0) \ge 0$.

Step 0.B. Set $\lambda_1 = \min(\lambda_\pi, \bar\lambda)$, where $\lambda_\pi$ is the average cost of $\pi$, the optimal policy of the $\lambda_0$-stopping problem. We solve the $\lambda_1$-stopping problem. Since $\lambda_1 \ge \lambda^*$, $V(\lambda_1) \le 0$. We check whether Theorem 2 or Proposition 2 is satisfied. If not, we continue to Step 1.

Step 1. We are now in the general case where we have solved a $\lambda_0$-stopping problem and a $\lambda_1$-stopping problem with $\lambda_0 \le \lambda^*$ and $\lambda_1 \ge \lambda^*$. The new $\lambda$-stopping problem to be solved is given by $\lambda^{\text{new}} = \min(\tilde\lambda, \lambda_\pi)$, where $\pi$ is the best (lowest average cost) policy determined to date, and $\tilde\lambda = \alpha\lambda_B + (1-\alpha)\lambda_A$ with $0 \le \alpha < 1$. The subscript B stands for bisection and the subscript A stands for approximation. Computational experience suggests choosing a low value of $\alpha$, since the approximation is quite accurate. We have $\lambda_B = \tfrac12\lambda_0 + \tfrac12\lambda_1$.

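$\lambda_A$ is the $\lambda$ such that $V_A(\lambda) = 0$, where $V_A(\lambda)$ is based on the four equations

$$V_A(\lambda_0) = V(\lambda_0), \qquad V_A'(\lambda_0) = -E[T_{\lambda_0}],$$
$$V_A(\lambda_1) = V(\lambda_1), \qquad V_A'(\lambda_1) = -E[T_{\lambda_1}], \qquad (4)$$

where $E[T_\lambda]$ is the expected time until stopping under the $\lambda$-optimal policy. These equations determine the coefficients of the cubic approximation $V_A(\lambda) = \beta_0 + \beta_1\lambda + \beta_2\lambda^2 + \beta_3\lambda^3$. The derivative conditions are based on Proposition 1.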

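The $\lambda^{\text{new}}$-stopping problem is solved, and $\lambda^{\text{new}}$ replaces $\lambda_1$ if $V(\lambda^{\text{new}}) < 0$ and replaces $\lambda_0$ if $V(\lambda^{\text{new}}) > 0$. We check whether Theorem 2 or Proposition 2 is satisfied. If not, we return to Step 1.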
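One pass of Step 1 can be sketched in Python as follows. This is a minimal sketch, not the author's code: it assumes the caller already knows $V(\lambda_0)$, $E[T_{\lambda_0}]$, $V(\lambda_1)$, $E[T_{\lambda_1}]$ and the average cost $\lambda_\pi$ of the best policy so far. The cubic satisfying (4) is written in Hermite form, and its root $\lambda_A$ is located by bisection, which is our choice since the text does not say how the root is computed. The function names `cubic_hermite` and `next_lambda` are hypothetical.

```python
def cubic_hermite(lam, lam0, v0, t0, lam1, v1, t1):
    """Cubic V_A with V_A(lam_k) = v_k and V_A'(lam_k) = -t_k for k = 0, 1 (eq. (4))."""
    h = lam1 - lam0
    u = (lam - lam0) / h
    h00 = 2 * u**3 - 3 * u**2 + 1      # Hermite basis functions on [0, 1]
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    # Slopes are multiplied by h because u is the rescaled variable.
    return h00 * v0 + h10 * h * (-t0) + h01 * v1 + h11 * h * (-t1)


def next_lambda(lam0, v0, t0, lam1, v1, t1, lam_pi, alpha, tol=1e-10):
    """One pass of Step 1: min(alpha * lam_B + (1 - alpha) * lam_A, lam_pi)."""
    # V_A(lam0) = v0 >= 0 > v1 = V_A(lam1), so bisection brackets a root lam_A.
    lo, hi = lam0, lam1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic_hermite(mid, lam0, v0, t0, lam1, v1, t1) >= 0:
            lo = mid
        else:
            hi = mid
    lam_a = 0.5 * (lo + hi)            # approximation step
    lam_b = 0.5 * (lam0 + lam1)        # bisection step
    return min(alpha * lam_b + (1 - alpha) * lam_a, lam_pi)


# Numbers from Example 3 (Section V): lam0 = 2.5 and lam1 = 17.5.
lam_pi = 17.5 - 84.816 / 12.805
print(next_lambda(2.5, 35.0, 2.0, 17.5, -84.816, 12.805, lam_pi, alpha=0.1))
```

With $E[T]$ taken as 2 at $\lambda_0 = 2.5$ (the two periods of the 1-repair decision) and 12.805 at $\lambda_1 = 17.5$, as in Example 3, the sketch returns a value close to the 9.481 used there.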


Comments. A value of $\alpha > 0$ in Step 1 assures that the "interval of uncertainty" goes to zero. Besides this, there is a rationale for a positive $\alpha$ even if the cubic approximation is excellent. Suppose that $\alpha = 0$ and $\lambda_A$ is close to either $\lambda_0$ or $\lambda_1$, say $\lambda_1$. Then we would prefer that $V(\lambda_A) \ge 0$, so that the next interval of uncertainty is $(\lambda_A, \lambda_1)$ rather than $(\lambda_0, \lambda_A)$. If $\alpha > 0$ then $\lambda^{\text{new}} < \lambda_A$ and $V(\lambda^{\text{new}}) > V(\lambda_A)$, so that $V(\lambda^{\text{new}})$ is "more likely" than $V(\lambda_A)$ to be greater than or equal to zero. When $\lambda_A$ is roughly midway between $\lambda_0$ and $\lambda_1$, the choice of $\alpha$ is not important.

If we stop before optimality, Proposition 3 says that either the current $\lambda_0$-optimal policy or the current $\lambda_1$-optimal policy will be the best policy determined to date, and that earlier efforts can be forgotten. The higher of the average returns of these two policies can be compared with the current lower bound $\lambda_0$ on $\lambda^*$.

An alternative to Step 1 would be the policy iteration approach, where $\lambda^{\text{new}} = \lambda_\pi$ and $\pi$ is the most recently considered policy. In this case the sequence of $\lambda$ would be decreasing to $\lambda^*$.

Finally, let $\pi$ be the optimal policy of a $\lambda$-stopping problem. It is interesting to observe, using the notation of (2), that

$$\lambda_\pi = \frac{E[C_\pi]}{E[T_\pi]} = \lambda + \frac{V(\lambda)}{E[T_\pi]} = \lambda - \frac{V(\lambda)}{V'(\lambda)}$$

when $V'(\lambda)$ exists, the last equality by Proposition 1. Therefore $\lambda_\pi$ equals the point where a supporting hyperplane to the concave function $V$ at $V(\lambda)$ would equal 0. This is reminiscent of Puterman and Brumelle's result [17] relating Newton's method and policy iteration in the finite state, finite action Howard model.
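The policy iteration alternative, read as Newton's method on $V$, can be illustrated with a toy problem. In this sketch (an assumption for illustration, not one of the paper's models) the $\lambda$-stopping problem has only a few hypothetical policies, each summarized by its pair $(E[C], E[T])$, so $V(\lambda)$ is the minimum of finitely many lines and is concave. Exact rational arithmetic lets the loop stop exactly when $V(\lambda) = 0$; the function names are hypothetical.

```python
from fractions import Fraction


def solve_lambda_problem(policies, lam):
    """V(lam) = min over policies of E[C] - lam * E[T]; also return the minimizer."""
    cost, time = min(policies, key=lambda ct: ct[0] - lam * ct[1])
    return cost - lam * time, cost, time


def policy_iteration(policies, lam, max_iter=100):
    """Iterate lam <- lam_pi = E[C_pi] / E[T_pi] until V(lam) = 0 (Theorem 2)."""
    for _ in range(max_iter):
        v, cost, time = solve_lambda_problem(policies, lam)
        if v == 0:
            return lam
        lam = cost / time      # same as lam + v / time = lam - V(lam) / V'(lam)
    raise RuntimeError("did not converge within max_iter")


# Hypothetical stopping policies given as (E[C], E[T]).
policies = [(Fraction(10), Fraction(2)), (Fraction(21), Fraction(5)),
            (Fraction(30), Fraction(8))]
print(policy_iteration(policies, Fraction(0)))   # 15/4
```

Starting below $\lambda^*$, the first step jumps to $\lambda_\pi = 5$, after which the iterates decrease to $\lambda^* = 15/4$, the smallest ratio $E[C]/E[T]$.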

IV. A Replacement-Stockage Model

Derman and Lieberman [9] consider a machine which requires one transistor. When a new transistor is installed there is a probability $f_s$ that it will perform at service level $s$, $s = 1,2,3,\ldots$. After each period the service level either stays at the same level or the transistor fails with probability $p$, independent of the length of service. When the transistor is in service, at the end of the period one may either leave it in service or remove it if the service level is unsatisfactory. Spare transistors are kept in a bin according to the rule that, when empty, the bin is restocked with $N$ new transistors and the machine is shut down for one period. The objective is to minimize expected average cost per period over an infinite horizon. The problem is to determine a restocking level $N^*$ and a rule for replacing a transistor in service which meet this objective. For simplicity we will assume a less general cost structure than they did, but one which does include their example problem. We assume an ordering cost of $K + cN$ when $N \ge 1$ transistors are ordered, where the constant $K$ includes the cost of operating with zero transistors during the one period it takes for the order to be received. The operating cost per period is $hn + w_s$, where $s$ is the level of service and $n$ is the number of transistors available, $0 \le n \le N$, and $h$ is a positive holding cost. We also want $w_s$ nonnegative and increasing in $s$.

Derman and Lieberman [9] formulate the problem as a countable state Markov decision problem with states $(i,s)$, $i \ge 1$, $s \ge 1$, and a state 0. When the system is in state $(i,s)$ they mean that $i$ units of stock are on hand, of which one is installed at operating level $s$. The possible decisions in state $(i,s)$ are to replace the unit in service at the end of the period or not to replace the unit in service. At state 0 they decide how many units to order. On page 615 of [9] they outline their algorithm, whose overall plan is to decompose the replacing and the ordering decisions. First they determine an upper bound $\bar N$ for $N$, and for $N = 1,\ldots,\bar N$ calculate by policy iteration the optimal policy for the problem of determining for which service levels the transistor should be replaced with a new one. Derman and Lieberman develop several tests to speed up the calculations. Bell [2] reconsiders the problem and applies the monotone stopping theorem to obtain some new tests to speed up the calculations, but the overall approach is that of Derman and Lieberman.

We will reformulate the problem by simplifying the state space at the price of enlarging the action space. We let the state space be the integers $i$, $i \ge 1$, where state $i$ means that $i$ units are on hand, including any in service. In this formulation 1 is the initial state. For any state $i$ there are a countable number of continue actions $a = 1,2,3,\ldots,\infty$, where taking action $a$ means that the installed unit will be replaced if it is operating at level $a$ or worse. The expected cost for state $i$ and action $a$ is

$$C(i,a) = \sum_{s<a} (hi + w_s)\,\frac{f_s}{p} + \sum_{s \ge a} (hi + w_s)\, f_s . \qquad (5)$$

The expected length of time in state $i$ is $\sum_{s<a} f_s/p + \sum_{s \ge a} f_s$. For convenience we have assumed a highly plausible form of the replacement rule. This assumption can be justified from equation (6), which follows. Because of the simple way in which the costs and expected transition time vary with the action $a$, the countable number of decisions does not cause a computational difficulty. Since the best action can always be determined, there is no theoretical complication in going from a finite number of continue actions to an infinite number.
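Equation (5) is easy to evaluate numerically. The following Python sketch assumes the data of Example 2 below ($p = .1$, $f_s = (1/2)^s$, $h = 4$, $w_s = 100(1.4 - .2/2^{s-1}) - 4$) and truncates the infinite sums at a hypothetical `S_MAX = 200` service levels; the function name `cost_and_time` is ours.

```python
# Example 2 data (an assumption for this sketch); S_MAX truncates the sums.
P, H, S_MAX = 0.1, 4.0, 200


def f(s):
    """Probability that a new transistor works at service level s."""
    return 0.5 ** s


def w(s):
    """Operating cost of service level s (increasing in s)."""
    return 100 * (1.4 - 0.2 / 2 ** (s - 1)) - 4


def cost_and_time(i, a):
    """Expected cost C(i,a) of eq. (5) and expected time spent in state i when
    the installed unit is replaced at the end of its first period iff s >= a."""
    cost = time = 0.0
    for s in range(1, S_MAX + 1):
        # A unit that is kept serves until it fails: 1/p periods on average.
        stay = 1 / P if s < a else 1.0
        cost += (H * i + w(s)) * f(s) * stay
        time += f(s) * stay
    return cost, time


c, t = cost_and_time(1, 2)
print(c, t, c - 123.63 * t)
```

For state 1 and action 2 the expected time is 5.5 periods, and $C(1,2) - 123.63 \times 5.5 \approx -13.30$, the continue cost of the 123.63-stopping problem that appears in Example 2.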

We make a second major alteration in the formulation by assuming that $P_{ij} = 1$ if $j = i+1$, rather than for $j = i-1$. Thus if $N$ is the reorder level, we will perceive the stock level as going from state 1 to 2, ..., to $N$ to $N+1$ to 1, rather than as it does physically from $N$ to $N-1$. In the averaging case this reordering will lead to the same average cost per period. The purpose of this reordering is to allow a stop decision in each state $i$, $i \ge 2$, so that for the $\lambda$-stopping problems the set $B$ will satisfy the monotone condition and Theorem 1 can be used. The cost of stopping, $C(i,s)$, is $K + c(i-1)$; stopping takes one period, and the system returns to state 1. We multiply $c$ by $i-1$ since if we stop at state $i$ the reorder level is $i-1$. For each $\lambda$-stopping problem the smallest state where we stop, and hence the reorder level, will be determined by applying the monotone stopping theorem.


To see that this reformulation is justified, consider the policy which in the Derman and Lieberman formulation orders two units in state 0, replaces in state $(1,s)$ if $s \ge 3$, and replaces in state $(2,s)$ if $s \ge 2$. The expected cost per cycle is

$$K + 2c + \sum_{s<3} (h + w_s)\frac{f_s}{p} + \sum_{s \ge 3} (h + w_s) f_s + (2h + w_1)\frac{f_1}{p} + \sum_{s \ge 2} (2h + w_s) f_s$$

and the expected length of the cycle is

$$1 + \sum_{s<3} \frac{f_s}{p} + \sum_{s \ge 3} f_s + \frac{f_1}{p} + \sum_{s \ge 2} f_s .$$

In our formulation that policy is: choose action 3 when in state 1, action 2 when in state 2, and stop in state 3. This policy is easily seen to have the same expected cost and the same expected length per cycle as those above.


In order to apply the Regenerative Stopping Algorithm, we first check Assumptions 1-3 for the $\lambda$-stopping problem in order to determine $\Lambda$. For any $\lambda$ let $S = \{i : hi > \lambda\}$. Assumptions 3i, 3ii and 3iii are satisfied. Assumption 2 is satisfied. Assumption 1i is not satisfied, but Assumption 1ii is satisfied.

Thus $\Lambda = (-\infty, +\infty)$, and clearly there is a large $\bar\lambda \in \Lambda$ such that $V(\bar\lambda) < 0$. Next we look at the set

$$B = \Big\{i : C(i,s) \le \min_{a \in A_i} C(i,a) + C(i+1,s),\ i \ge 2\Big\} = \Big\{i : 0 \le \min_{a \in A_i} C(i,a) + c,\ i \ge 2\Big\},$$

where $C(i,s)$ is the cost of stopping for the $\lambda$-stopping problem and equals $K + c(i-1) - \lambda$. The $C(i,a)$ are the continue costs for the $\lambda$-stopping problem and equal

$$C(i,a) = \sum_{s<a} (hi + w_s - \lambda)\,\frac{f_s}{p} + \sum_{s \ge a} (hi + w_s - \lambda)\, f_s .$$

Since $C(i,a)$ is increasing in $i$ because of the holding costs, $B$ is of the form $\{i : i \ge i^*\}$ for some integer $i^*$, which is a closed set, and Theorem 1 can be applied. It turns out that the best continue decision is easily determined for the $\lambda$-stopping problem, since $P_{i,i+1} = 1$ regardless of the choice of $a$. For any state $i$ the best continue action, $a^*(i)$, is

$$a^*(i) = \inf\{a : hi + w_a > \lambda\}. \qquad (6)$$
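One way to see why (6) holds: in the continue cost, service level $s$ contributes $(hi + w_s - \lambda) f_s / p$ if the unit is kept and $(hi + w_s - \lambda) f_s$ if it is replaced, and since $1/p > 1$, keeping pays exactly when $hi + w_s < \lambda$; because $w_s$ is increasing, the kept levels form an initial segment. The Python sketch below checks (6) against brute-force minimization over $a$, assuming the Example 2 data and a hypothetical truncation `S_MAX = 200`; the helper names are ours.

```python
import math

# Example 2 data (assumed); S_MAX truncates the sums over service levels.
P, H, S_MAX = 0.1, 4.0, 200


def f(s):
    return 0.5 ** s


def w(s):
    return 100 * (1.4 - 0.2 / 2 ** (s - 1)) - 4


def continue_cost(i, a, lam):
    """lam-adjusted continue cost: sum_s (h*i + w_s - lam) f_s, times 1/p if s < a."""
    return sum((H * i + w(s) - lam) * f(s) * (1 / P if s < a else 1.0)
               for s in range(1, S_MAX + 1))


def best_action(i, lam):
    """Rule (6): a*(i) = inf{a : h*i + w_a > lam}; math.inf means never replace."""
    for a in range(1, S_MAX + 1):
        if H * i + w(a) > lam:
            return a
    return math.inf


# Compare rule (6) with brute-force minimization over a = 1, ..., 29.
for i in range(1, 5):
    brute = min(range(1, 30), key=lambda a: continue_cost(i, a, 123.63))
    print(i, best_action(i, 123.63), brute)
```

At $\lambda = 123.63$ both methods give $a^*(1) = 2$ and $a^*(i) = 1$ for $i = 2, 3, 4$, matching the values used in Example 2.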

It is precisely (6) that Bell [2] exploits in his approach to this problem. We are ready to apply the algorithm once we have determined $\lambda_0$, the lower bound on $\lambda^*$. This lower bound is obtained by assuming that $f_1 = 1$, that is, that we always go to the most favorable operating state. This assumption eliminates the replacement decision and

$$\lambda_0 = \min_{N \ge 1} \left\{ \left(K + \sum_{i=1}^{N} \left[c + \frac{w_1 + hi}{p}\right]\right) \Big/ \left(1 + \frac{N}{p}\right) \right\}. \qquad (7)$$
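The lower bound (7) is a one-dimensional minimization over the order quantity. A minimal Python sketch, assuming the Example 2 data below and a hypothetical search limit `n_max`:

```python
# Example 2 data (assumed for illustration).
K, C_UNIT, H, P = 140.0, 20.0, 4.0, 0.1
W1 = 100 * (1.4 - 0.2) - 4          # w_1 = 116, the most favorable service level


def lambda0(n_max=100):
    """Lower bound (7): pretend f_1 = 1 and minimize over the order quantity N."""
    best = None
    for n in range(1, n_max + 1):
        numerator = K + sum(C_UNIT + (W1 + H * i) / P for i in range(1, n + 1))
        value = numerator / (1 + n / P)   # N/p operating periods plus 1 ordering period
        if best is None or value < best[0]:
            best = (value, n)
    return best


print(lambda0())
```

The minimum is attained at $N = 1$ with value $1360/11 \approx 123.636$, reported in Example 2 as 123.63.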

Example 2. The first problem we consider is one that is presented in both Derman and Lieberman [9] and Bell [2]. The data are $p = .1$, $f_s = (1/2)^s$, $s = 1,2,3,\ldots$, $h = 4$, $w_s = 100(1.4 - .2/2^{s-1}) - 4$, $K = 140$, and $c = 20$. In this example we will let $\alpha$, the weighting factor of the Regenerative Stopping Algorithm, be .1.

First eq. (7) is solved, and $\lambda_0 = 123.63$ with the minimizing $N = 1$. Then the $\lambda_0$-stopping problem is solved. For state 1, $a^*(1) = 2$ using (6), since $4 + 116 < 123.63$ but $4 + 126 > 123.63$. The value of

$$C(1,2) = (4 + 116 - 123.63)(1/2)(10) + \sum_{s \ge 2} (4 + w_s - 123.63) f_s = -13.29 .$$

For state 2, $a^*(2) = 1$ and $C(2,1) > 0$, so that $2 \in B$. Since we stop at state 2, the reorder level is 1. $C(2,123.63) = 160 - 123.63 = 36.37$, and $C(1,123.63) = -13.29 + 36.37 = 23.08 = V(123.63)$.




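We do not determine a value for $\bar\lambda$, and $\lambda_1 = 123.63 + (23.08/6.5) = 127.18$, the average cost per period of the above policy $\pi$ (its expected cycle length is 6.5: 5.5 periods in state 1 plus the one-period stop). When we solve the 127.18-stopping problem, the policy $\pi$ is again optimal and $V(127.18) = 0$, so that $\pi$ is optimal by Theorem 2.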
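The whole of Example 2 fits in a short Python sketch. It assumes the Example 2 data, a hypothetical truncation `S_MAX = 200` of the sums over service levels, and that the continue action in each state is chosen by (6). Because $B = \{i : i \ge i^*\}$ and the chain moves deterministically from $i$ to $i+1$, the solver walks up from state 1 and stops at the first $i \ge 2$ with $\min_a C(i,a) + c \ge 0$. The name `solve_replacement_stopping` is ours.

```python
import math

# Example 2 data; S_MAX truncates the sums over service levels (an assumption).
P, H, K, C_UNIT, S_MAX = 0.1, 4.0, 140.0, 20.0, 200


def f(s):
    return 0.5 ** s


def w(s):
    return 100 * (1.4 - 0.2 / 2 ** (s - 1)) - 4


def best_action(i, lam):
    """Rule (6): smallest a with h*i + w_a > lam (math.inf = never replace)."""
    for a in range(1, S_MAX + 1):
        if H * i + w(a) > lam:
            return a
    return math.inf


def continue_cost_and_time(i, a, lam):
    """lam-adjusted continue cost C(i,a) and expected time spent in state i."""
    cost = time = 0.0
    for s in range(1, S_MAX + 1):
        stay = 1 / P if s < a else 1.0
        cost += (H * i + w(s) - lam) * f(s) * stay
        time += f(s) * stay
    return cost, time


def solve_replacement_stopping(lam, i_max=1000):
    """Return (V(lam), expected cycle length, stopping state i*); reorder level i* - 1."""
    v = t = 0.0
    for i in range(1, i_max + 1):
        cont, time = continue_cost_and_time(i, best_action(i, lam), lam)
        if i >= 2 and cont + C_UNIT >= 0:              # i is in B, so stop here
            return v + K + C_UNIT * (i - 1) - lam, t + 1, i   # stopping takes one period
        v, t = v + cont, t + time
    raise ValueError("no stopping state found below i_max")


v0, t0, i_star = solve_replacement_stopping(123.63)
lam1 = 123.63 + v0 / t0          # average cost of the policy found (Step 0.B)
v1, _, _ = solve_replacement_stopping(lam1)
print(round(v0, 2), i_star, round(lam1, 2), v1)
```

The sketch stops at state 2, gives $V(123.63) \approx 23.07$ (the text's 23.08 comes from rounding $C(1,2)$ to $-13.29$), $\lambda_1 \approx 127.18$, and $V(\lambda_1) \approx 0$ up to rounding error.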

In order to gain more experience with the algorithm, the following problems were solved. As before $p = .1$, $f_s = (1/2)^s$, $s = 1,2,3,\ldots$, $K = 140$, $c = 20$, but $h = .1$ and $w_s = H(1 - (15/16)^s) - .1$, where $H$ is a scalar. These problems had optimal order quantities $N^*$ of between 10 and 20. Recall that the algorithms of Bell [2] and Derman and Lieberman [9] must solve $N^*$ policy iteration problems.

Different runs were made using different values of the weighting factor $\alpha$ and the parameter $H$. The computational results, in terms of the number of $\lambda$-stopping problems solved, were:

              α = 0.0   α = .02   α = .10   α = .25
   H = 4.6       4         5         6         6
   H = 4.8       5         5         5         6
   H = 5.0       5         5         5         6
   H = 5.2       5         5         5         5


V. Markov Deterioration with Uncertain Information

Rosenfield [18] considers a maintenance problem where the underlying Markov process has actual states $0,1,\ldots,N$, where 0 is the best state and $N$ is the worst state. The actual state is not known except at certain times, and the observed state is $(i,k)$, which means that $k$ periods ago the system was observed in state $i$, where $0 \le i \le N$ and $k \ge 0$. Each period the actual state of the system changes according to a Markov transition matrix $P$. Thus if the process is in state $(i,k)$, the probability that the actual state is $j$ is $P^k_{ij}$, the $ij$ element of the matrix $P$ to the $k$th power.




Each period there are three available decisions for state $(i,k)$: repair, no action, or inspect. If the repair decision is selected, the cost is $R$ and the system moves to state $(0,0)$ the next period. With the decision no action the expected cost is $\sum_{j=0}^{N} P^k_{ij} L_j$ and the system moves to state $(i,k+1)$ the next period, where $L_j$ is interpreted as the one-period operating cost when the machine is in state $j$. The expected cost associated with the inspect decision is $M + \sum_{j=0}^{N} P^k_{ij} L_j$, where $M$ is the cost of inspection, and the system moves to state $(j,0)$ with probability $P^{k+1}_{ij}$.


This is the Rosenfield model, except that we have not permitted the cost of repair to depend on the actual state. Like Rosenfield, we will require that the matrix $P$ satisfy $P_{ij} = 0$ if $j < i$, $P_{ii} < 1$ except for $i = N$, and that $\sum_{j \ge k} P_{ij}$ be nondecreasing in $i$ for $k = 1,2,\ldots,N$. Rosenfield cites results which show that if $P$ satisfies the above conditions then so will $P^k$. We will assume that the $L_j$ are increasing in $j$. From Rosenfield [18, Lemma 1] we have the result that for any increasing function $W_j$, $0 \le j \le N$, $\sum_{j=0}^{N} P^k_{ij} W_j$ is increasing in $i$ and in $k$ when $P$ satisfies the above conditions.
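Rosenfield's conditions and the monotonicity in [18, Lemma 1] can be checked numerically on a small instance. The Python sketch below assumes the three-state matrix and operating costs of Example 3 (an illustration only; a numerical check is not a proof), and its helper names are hypothetical.

```python
# Example 3 data (assumed for illustration).
P = [[0.8, 0.1, 0.1],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
L = [0.0, 10.0, 20.0]


def mat_mul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(P, k):
    """P to the kth power (identity for k = 0)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


def satisfies_conditions(P, tol=1e-12):
    """P_ij = 0 for j < i, P_ii < 1 for i < N, tail sums nondecreasing in i."""
    n = len(P)
    upper = all(P[i][j] <= tol for i in range(n) for j in range(i))
    not_stuck = all(P[i][i] < 1 - tol for i in range(n - 1))
    monotone_tails = all(sum(P[i][t:]) <= sum(P[i + 1][t:]) + tol
                         for t in range(1, n) for i in range(n - 1))
    return upper and not_stuck and monotone_tails


def expected_value(P, W, i, k):
    """sum_j P^k_ij W_j: expected value of W after k periods from actual state i."""
    Pk = mat_pow(P, k)
    return sum(Pk[i][j] * W[j] for j in range(len(W)))


print(satisfies_conditions(P), [expected_value(P, L, 0, k) for k in range(4)])
```

For this matrix the conditions hold for $P$ and its powers, and $\sum_j P^k_{0j} L_j$ grows with $k$ (0, 3, 5.5, 7.59, up to rounding).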

As with the replacement and stockage model of the previous section, we will reformulate the problem and simplify the state space by enlarging the action space. In our formulation the states will be the actual states $0,1,\ldots,N$. For each state $i$ there will be both a countable number of continue actions and a countable number of stopping actions. The continue actions are of the form $a$-inspect, $a \ge 0$, and have the interpretation "inspect after $a$ periods have passed." The decision takes $a+1$ periods and the associated cost when in state $i$ is

$$L_i + \sum_j P_{ij} L_j + \cdots + \sum_j P^a_{ij} L_j + M .$$

For $a = 0$ this cost is $L_i + M$. The system then moves to state $j$ with probability $P^{a+1}_{ij}$. The stopping actions are of the form $a$-repair, $a \ge 0$, and have the interpretation "repair after $a$ periods have passed." This decision takes $a+1$ periods and the associated cost when in state $i$ is

$$L_i + \sum_j P_{ij} L_j + \cdots + \sum_j P^{a-1}_{ij} L_j + R .$$

For $a = 0$ this cost is $R$, and for $a = 1$ this cost is $L_i + R$. The system then returns to state 0. More than one stopping decision does not conform with the formulation of Section III. However, this will cause no difficulty, since for each $\lambda$-stopping problem we can identify the lowest cost stopping decision.
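The cost and length of the two action families can be computed by propagating the distribution of the actual state one period at a time. A minimal Python sketch, assuming the Example 3 data for illustration; the function names are hypothetical.

```python
# Example 3 data (assumed for illustration).
P = [[0.8, 0.1, 0.1],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
L = [0.0, 10.0, 20.0]
R, M = 40.0, 5.0


def operating_cost(P, L, i, periods):
    """L_i + sum_j P_ij L_j + ... + sum_j P^(periods-1)_ij L_j (0 if periods == 0)."""
    n = len(L)
    row = [1.0 if j == i else 0.0 for j in range(n)]   # distribution of the actual state
    total = 0.0
    for _ in range(periods):
        total += sum(r * l for r, l in zip(row, L))
        row = [sum(row[m] * P[m][j] for m in range(n)) for j in range(n)]
    return total


def inspect_action(P, L, M, i, a):
    """a-inspect: (cost, periods); operating costs for a + 1 periods plus M."""
    return operating_cost(P, L, i, a + 1) + M, a + 1


def repair_action(P, L, R, i, a):
    """a-repair: (cost, periods); a operating periods, then a repair period costing R."""
    return operating_cost(P, L, i, a) + R, a + 1


print(inspect_action(P, L, M, 0, 2), repair_action(P, L, R, 1, 1))
```

For instance, 2-inspect in state 0 costs about $0 + 3 + 5.5 + 5 = 13.5$ over 3 periods, and 1-repair in state 1 costs $L_1 + R = 50$ over 2 periods.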

There is some loss of generality with our formulation of the state space. Once the system is in state $(i,0)$, $0 \le i \le N$, in the Rosenfield notation, the formulations are the same. However, if the initial state is $(i,k)$, $k > 0$, then our policies do not apply until the first inspection or repair takes place.

In order to apply the Regenerative Stopping Algorithm, we check Assumptions 1-3 for the $\lambda$-stopping problem in order to determine $\Lambda$. For $\lambda < L_N$ let $S = \{N\}$. It is easily seen that Assumption 3 holds. The assumption that $\sup_i \max_a C(i,a)$ is bounded does not hold, since $a$ can be arbitrarily large. However, we will see that only actions satisfying $\sum_{j=0}^{N} P^a_{ij} L_j < \lambda < L_N$ need be considered, and for this finite set of actions the assumption in question does hold. Assumption 1i is satisfied by the stopping decision 0-repair, whose cost is $R - \lambda$. Thus $\Lambda = (-\infty, L_N)$. It will not necessarily be the case that there is a $\bar\lambda \in \Lambda$ such that $V(\bar\lambda) < 0$, and for certain parameters it is optimal never to repair. For these problems the Regenerative Stopping Algorithm does not apply.

For any $\lambda$-stopping problem the set $B$ is

$$B = \Big\{i : C(i,s) \le \min_{a \in A_i} \Big[C(i,a) + \sum_j P^{a+1}_{ij} C(j,s)\Big]\Big\},$$

where $C(j,s)$ is the cost incurred using the minimum cost stopping decision when in state $j$ for the $\lambda$-stopping problem. The $C(i,a)$ are the continue costs for the $\lambda$-stopping problem. The set $B$ is rather difficult to determine, so instead we consider $D = \{i : L_i > \lambda\}$. This set is closed and is a subset of $B$, since if $L_i > \lambda$ then $C(i,a) > 0$ and $\sum_j P^{a+1}_{ij} C(j,s) \ge C(i,s)$ by Rosenfield's lemma, as the $C(j,s)$ are increasing in $j$, so that $i \in B$.

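As a preliminary to solving the $\lambda$-stopping problem, we determine the minimum cost stopping decisions $C(i,s)$. Let $a(i)$ be the largest $a$, $a \ge 0$, such that $\sum_j P^a_{ij} L_j < \lambda$. (By Rosenfield's lemma $\sum_j P^a_{ij} L_j$ is increasing in $a$, so the inequality holds exactly for $a = 0,\ldots,a(i)$.) If $L_i \ge \lambda$ then no $a$ satisfies the previous inequality, and by convention we set $a(i) = -1$. With this convention the minimum cost stopping decision for state $i$ is $(a(i)+1)$-repair. The cost incurred using this decision, $C(i,s)$, is

$$L_i + \cdots + \sum_j P^{a(i)}_{ij} L_j + R - (a(i)+2)\lambda .$$

If $a(i) = -1$, then $C(i,s) = R - \lambda$.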
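The computation of $a(i)$ and $C(i,s)$ just described can be sketched in Python, assuming the Example 3 data for illustration. The name `stop_decision` and the safety cap `a_cap` are hypothetical; the cap only guards against a $\lambda$ outside $\Lambda$, where the loop would never end.

```python
# Example 3 data (assumed for illustration).
P = [[0.8, 0.1, 0.1],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
L = [0.0, 10.0, 20.0]
R = 40.0


def stop_decision(P, L, R, i, lam, a_cap=10_000):
    """Return (a(i), C(i,s)): the minimum cost stopping decision is (a(i)+1)-repair."""
    n = len(L)
    row = [1.0 if j == i else 0.0 for j in range(n)]   # distribution after 0 periods
    a, operating = -1, 0.0
    for _ in range(a_cap):
        value = sum(r * l for r, l in zip(row, L))     # sum_j P^(a+1)_ij L_j
        if value >= lam:
            return a, operating + R - (a + 2) * lam
        a += 1
        operating += value
        row = [sum(row[m] * P[m][j] for m in range(n)) for j in range(n)]
    raise ValueError("a(i) exceeds a_cap; is lam in Lambda = (-inf, L_N)?")


print(stop_decision(P, L, R, 0, 2.5), stop_decision(P, L, R, 1, 17.5))
```

With these data the sketch reproduces $a(0) = 0$ and $C(0,s) = 35$ at $\lambda = 2.5$, $a(1) = 13$ and $a(0) = 14$ at $\lambda = 17.5$, and $a(0) = 4$ at $\lambda = 9.448$, as in Example 3.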

The $\lambda$-stopping problem is solved as follows. For $i \in D = \{i : L_i > \lambda\}$, Theorem 1 applies and we stop. For these states $a(i) = -1$, and $C(i,\lambda) = C(i,s) = R - \lambda$. For $i \notin D$ we compute $C$ by the standard equation of optimality, starting with the largest state and going down to 0. Thus we have

$$C(i,\lambda) = \min\Big(C(i,s),\ \min_{0 \le a \le a(i)} \Big[C(i,a) + \sum_j P^{a+1}_{ij} C(j,\lambda)\Big]\Big). \qquad (8)$$

The $C(i,a)$ in (8) equals $L_i + \sum_j P_{ij} L_j + \cdots + \sum_j P^a_{ij} L_j + M - (a+1)\lambda$. The states $j$ such that $P^{a+1}_{ij} > 0$ are necessarily larger than or equal to $i$, and $C$ has previously been evaluated for those states other than $i$ itself; the term with $j = i$ makes each candidate in (8) a linear equation in $C(i,\lambda)$. We only need to consider actions $a$-inspect such that $a \le a(i)$, since $C(i,a)$ is increasing in $a$ for $a > a(i)$ and, by Rosenfield's lemma, $\sum_j P^{a+1}_{ij} C(j,\lambda)$ is increasing in $a$.
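The backward recursion (8) can be sketched in Python for the data of Example 3. This is our rendering, not the author's program: for each candidate $a$-inspect the $j = i$ term is moved to the left-hand side, giving $(C(i,a) + \sum_{j>i} P^{a+1}_{ij} C(j,\lambda)) / (1 - P^{a+1}_{ii})$, which assumes the same action is repeated each time state $i$ is revisited and uses $P_{ii} < 1$ for $i < N$. The name `solve_deterioration_stopping` is hypothetical.

```python
# Example 3 data (assumed for illustration).
P = [[0.8, 0.1, 0.1],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
L = [0.0, 10.0, 20.0]
R, M = 40.0, 5.0


def step(row):
    """Propagate a distribution over actual states one period through P."""
    n = len(row)
    return [sum(row[m] * P[m][j] for m in range(n)) for j in range(n)]


def solve_deterioration_stopping(lam):
    """Backward recursion (8); returns [C(0, lam), ..., C(N, lam)], so V(lam) = C(0, lam)."""
    if lam >= L[-1]:
        raise ValueError("lam must lie in Lambda = (-inf, L_N)")
    n = len(L)
    C = [0.0] * n
    for i in reversed(range(n)):                 # largest state first
        row = [1.0 if j == i else 0.0 for j in range(n)]
        acc = 0.0                                # sum over periods of (expected L - lam)
        candidates = []
        while True:
            value = sum(r * l for r, l in zip(row, L))
            if value >= lam:                     # a(i) reached (a(i) = -1 if L_i >= lam)
                break
            acc += value - lam
            row = step(row)                      # row is now the ith row of P^(a+1)
            # a-inspect: solve x = acc + M + sum_{j>i} P^(a+1)_ij C(j) + P^(a+1)_ii x
            rest = sum(row[j] * C[j] for j in range(i + 1, n))
            candidates.append((acc + M + rest) / (1 - row[i]))
        stop = acc + R - lam                     # (a(i)+1)-repair, the cheapest stop
        C[i] = min([stop] + candidates)
    return C


for lam in (2.5, 17.5, 9.481, 9.448):
    print(lam, round(solve_deterioration_stopping(lam)[0], 4))
```

The printed values of $V(\lambda)$ (35, about $-84.815$, about $-0.10$, about $0.134$) are close to those reported in Example 3.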


The initial lower bound $\lambda_0$ on the optimal average cost is obtained by assuming that an inspection brings the system back to state 0, so that our costs are as with an inspection, but we get the benefit of a repair. In this case

$$\lambda_0 = \min_{a > 0} \Big(L_0 + \sum_j P_{0j} L_j + \cdots + \sum_j P^{a-1}_{0j} L_j + M\Big) \Big/ (a+1). \qquad (9)$$

Example 3. We assume that there are 3 states, 0, 1, and 2. The cost $R$ of repair is 40, and the cost $M$ of inspection is 5; $L_0 = 0$, $L_1 = 10$, and $L_2 = 20$. The transition matrix is

$$P = \begin{pmatrix} .8 & .1 & .1 \\ 0 & .9 & .1 \\ 0 & 0 & 1.0 \end{pmatrix}.$$
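The lower bound (9) for these data can be computed with a short Python sketch; the search limit `a_max` and the function name are hypothetical.

```python
# Example 3 data.
P = [[0.8, 0.1, 0.1],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
L = [0.0, 10.0, 20.0]
M = 5.0


def lambda0(a_max=200):
    """Lower bound (9): min over a > 0 of (L_0 + ... + sum_j P^(a-1)_0j L_j + M) / (a + 1)."""
    n = len(L)
    row = [1.0] + [0.0] * (n - 1)      # start in actual state 0
    operating, best = 0.0, None
    for a in range(1, a_max + 1):
        operating += sum(r * l for r, l in zip(row, L))   # adds sum_j P^(a-1)_0j L_j
        row = [sum(row[m] * P[m][j] for m in range(n)) for j in range(n)]
        value = (operating + M) / (a + 1)
        if best is None or value < best[0]:
            best = (value, a)
    return best


print(lambda0())
```

The minimizing $a$ is 1, with $\lambda_0 = (0 + 5)/2 = 2.5$; the next candidates, $8/3$ for $a = 2$ and $13.5/4$ for $a = 3$, are larger.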

We will not try to determine a $\bar\lambda \in \Lambda$ such that $V(\bar\lambda) < 0$, and will address this problem at Step 0.B of the Regenerative Stopping Algorithm. We calculate $\lambda_0$ by (9). The minimizing $a$ is 1 and $\lambda_0 = (0+5)/2 = 2.5$. The 2.5-stopping problem is solved. The set $D = \{1,2\}$, so that $C(2,2.5) = C(1,2.5) = 40 - 2.5 = 37.5$. For state 0 we determine $a(0)$, which is 0 since $0 < 2.5$ and $[.8, .1, .1]$ times $[0, 10, 20]^T$ equals $3 > 2.5$. Therefore

$$C(0,2.5) = \min\big(40 - 5,\ 0 + 5 - 2.5 + .8\,C(0,2.5) + .2(37.5)\big) = 35 = V(2.5).$$

The first term in the parentheses is the cost of the decision 1-repair and the second term is the cost of the decision 0-inspect. The minimum is achieved with the decision 1-repair, and the average cost per period is $2.5 + 35/2 = 20$. This presents a difficulty, since $20 \notin \Lambda$ and cannot be used. We arbitrarily choose a $\lambda \in \Lambda$ and hope that $V(\lambda) < 0$. If $V(\lambda) < 0$ does not obtain, we will try a larger $\lambda \in \Lambda$. This arbitrarily chosen $\lambda$ is 17.5, and we solve the 17.5-stopping problem. $D = \{2\}$, so that $C(2,17.5) = 40 - 17.5 = 22.5$. For state 1, $a(1)$ equals 13. The minimizing action for state 1 is to inspect after 5 periods. For state 0, $a(0)$ equals 14. The minimizing action for state 0 is to inspect after 5 periods. $V(17.5) = C(0,17.5) = -84.816$, and the expected time until stopping is 12.805. Thus 17.5 can play the role of $\bar\lambda$.

The next $\lambda$ used was a weighted average of $.9\lambda_A$ and $.1\lambda_B$ and equaled 9.481. This was lower than the average cost of the optimal policy of the $\bar\lambda$-stopping problem, $17.5 - (84.816/12.805)$. This stopping problem is solved and $V(9.481) = C(0,9.481) = -.107$.

The next $\lambda$ used is 9.448. The stopping problem is solved and $V(9.448) = C(0,9.448) = .1327$. The optimal policies for the 9.481 and 9.448 stopping problems are the same, so that Proposition 2 can be used to confirm optimality. This policy is 0-repair for states 1 and 2 and 2-inspect for state 0. For $\lambda = 9.448$, $C(2,9.448) = C(1,9.448) = 30.552$. For state 0, $a(0) = 4$ and the optimal decision is 2-inspect. The value $C(0,9.448) = .1327$ was obtained by solving $C(0,9.448) = -19.8445 + 5 + .488(30.552) + .512\,C(0,9.448)$.